Feasibility Study

Singapore, Hong Kong & Malaysia

Csaba Pusztai

Overview

The purpose of this study is to assess the feasibility of extending the production of aggregate statistics from Xero customer data to countries with lower market penetration than the main Xero markets (e.g. Australia, New Zealand or the United Kingdom). This analysis focuses on Singapore, Hong Kong and Malaysia as potential new additions. The following questions are addressed:

1. What indicators can be measured (What data is available)?

2. How accurately can we measure indicators?

3. What level of analysis is possible?

What indicators can be measured?

Aggregate business and economic statistics are mostly drawn from common transactional data (such as invoices) and reconstructed general ledger data. Currently, the only exceptions are indicators related to employment and jobs, which are derived from transactional payroll data. As Xero Payroll is an add-on module available only to customers in Australia, New Zealand and the United Kingdom, any metric drawing on this data source is confined to those countries.

| Metric theme | Singapore | Hong Kong | Malaysia |
|---|---|---|---|
| Invoicing/payment | ✓ | ✓ | ✓ |
| Cash flow | ✓ | ✓ | ✓ |
| Revenue | ✓ | ✓ | ✓ |
| Jobs/wages | n/a | n/a | n/a |

How accurately can we measure indicators?

Given we are using Xero customer data to tap into broader economic trends, it is important to understand how confident we can be in generalising our results over the target population of small businesses. The closer our measurements are to the true value of the various indicators, the more valuable they are for analytical and decision-making purposes. Unfortunately, the true value is usually unknown, so we can only have an estimate of the accuracy (or the lack thereof). Generally speaking, the following aspects of accuracy need to be evaluated:

  1. Selectivity. Xero customer data is not the result of a random sampling design, but is organically generated. This inherent self-selection may be a source of bias. For example, some industries are typically under- or over-represented in the Xero dataset compared to the real population. Any known biases need to be corrected for, if feasible. The current analysis accounts for industry bias.
  2. Margin of error. We use a sample to estimate the value of an indicator at the population level, so we need to understand how likely it is that our estimate is reasonably close to the true value of the indicator.
  3. Precision. Are our indicator measurements reproducible?

The next section displays estimates for SBI indicators. The estimates are bias-corrected (accounting for variability in industry representation). The charts show the mean estimates and 95% confidence intervals for the estimates. (Precision in our case would mostly involve looking into reproducing measurements over time as the dataset evolves, which is likely to follow a similar pattern to established markets.)
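As an illustration of what a bias correction for industry representation can look like, the sketch below uses post-stratification: each industry's sample mean is weighted by that industry's share of the target population rather than by its share of the sample. This is a minimal sketch and not necessarily the procedure used for SBI. The function name, the input layout and all figures are hypothetical, and the 95% interval relies on a normal approximation.

```python
import math

def poststratified_estimate(strata, z=1.96):
    """Post-stratified mean and margin of error (normal approximation).

    strata: list of dicts, one per industry, with keys
      'pop_share' - industry share of the target population (assumed known)
      'mean'      - sample mean of the indicator in that industry
      'std'       - sample standard deviation in that industry
      'n'         - number of sampled firms in that industry
    """
    total_share = sum(s["pop_share"] for s in strata)
    estimate = 0.0
    variance = 0.0
    for s in strata:
        w = s["pop_share"] / total_share      # normalise weights to sum to 1
        estimate += w * s["mean"]             # weight by population share
        variance += (w ** 2) * (s["std"] ** 2) / s["n"]
    margin = z * math.sqrt(variance)          # half-width of the interval
    return estimate, margin

# Hypothetical figures, for illustration only (days to pay by industry).
strata = [
    {"pop_share": 0.5, "mean": 30.0, "std": 10.0, "n": 100},
    {"pop_share": 0.3, "mean": 40.0, "std": 20.0, "n": 400},
    {"pop_share": 0.2, "mean": 50.0, "std": 10.0, "n": 25},
]
estimate, margin = poststratified_estimate(strata)
print(f"estimate = {estimate:.1f} days, 95% CI = +/- {margin:.2f} days")
```

In this made-up example the smallest industry sample (25 firms) contributes a disproportionate share of the variance, which is the usual reason thinly covered industries widen the interval.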

Example #1

Getting Paid

The headline Getting Paid metric for SBI estimates the average time it takes for a credit sales invoice to get paid (the number of days elapsed between the day of invoicing and the day of full payment). For illustrative purposes, the charts below show the metric for 30-day invoices with New Zealand displayed for comparison. [1]

New Zealand estimates for Average-Days-To-Pay currently come with a 2.5% margin of error, a result of having a substantial part of the small business economy on Xero. None of the three candidate countries currently comes close to this level of accuracy. Singapore, the closest, has more than three times the margin of error (~7.7%), whereas Malaysia is far behind at 18%. This level of inaccuracy renders these measures inadequate for solid analysis for the time being.
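To put these gaps in perspective, a rough rule of thumb is that the margin of error shrinks with the square root of the sample size. The sketch below applies that rule to the figures quoted above. It assumes the variability of the indicator stays the same as the sample grows and ignores weighting and finite-population effects, so the results are order-of-magnitude guides only. The function name is hypothetical.

```python
def sample_growth_factor(current_moe, target_moe):
    """Factor by which the sample would need to grow to reach target_moe,
    assuming the margin of error is proportional to 1 / sqrt(n)."""
    return (current_moe / target_moe) ** 2

# Margins of error (in %) quoted in the text; 2.5% is the New Zealand level.
factor_singapore = sample_growth_factor(7.7, 2.5)   # roughly 9.5
factor_malaysia = sample_growth_factor(18.0, 2.5)   # roughly 51.8

print(f"Singapore: ~{factor_singapore:.1f}x the current sample")
print(f"Malaysia:  ~{factor_malaysia:.1f}x the current sample")
```

Under these simplifying assumptions, matching New Zealand's accuracy would take roughly nine to ten times Singapore's current sample and around fifty times Malaysia's.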

Example #2

Cash flow

The headline cash flow indicator for SBI estimates the proportion of small businesses that turn out to be cash flow positive in a given month.

This proportion is somewhat easier to estimate given the smaller differences across groups (industries). Also, the sample for this indicator contains more firms than the ‘Getting Paid’ sample, as it captures a broader aspect of business. The margins of error are substantially smaller for all three candidate countries than those of the Getting Paid metric, but accuracy is still far from that of the New Zealand metric.
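For a proportion such as the share of cash flow positive firms, the margin of error can be approximated from the sample size alone. The sketch below uses the standard normal-approximation formula for a single, unweighted sample. It is a simplification of the bias-corrected estimates discussed in this study, and the inputs are hypothetical.

```python
import math

def proportion_margin_of_error(p_hat, n, z=1.96):
    """95% margin of error for a sample proportion (normal approximation).

    p_hat: observed share of cash flow positive firms (between 0 and 1)
    n:     number of firms in the sample
    """
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

# Hypothetical example: 50% of 400 sampled firms are cash flow positive.
moe = proportion_margin_of_error(0.5, 400)
print(f"margin of error = +/- {moe * 100:.1f} percentage points")
```

A share of 0.5 is the worst case for this formula, so it gives a conservative bound for a given sample size.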

What level of analysis is possible?

As is clear from the currently achievable levels of accuracy, none of the three countries yet has data that would support meaningful analysis. As Xero’s customer base grows they may cross that threshold, with Singapore the strongest candidate and Malaysia the weakest.

The following table summarises how soon certain types of analysis may become possible, assuming that Xero’s customer base in these countries follows the growth trend of the last few years.

| Type of analysis | Singapore | Hong Kong | Malaysia |
|---|---|---|---|
| Country level | 1 yr | 2 yrs | 3 yrs |
| Cross industry [2] | 2 yrs | 3 yrs | 4 yrs |
| Cross geographies [3] | ? | ? | 4 yrs |
| Longitudinal [4] | 6 yrs | 7 yrs | 8 yrs |
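Timelines of this kind depend on the assumed growth path. As a minimal sketch of the reasoning, and not the method used to produce the table, the function below estimates how many years a sample needs to reach a target size under constant compound growth. The function name, the sample sizes and the growth rate are all hypothetical.

```python
import math

def years_to_target(current_n, target_n, annual_growth):
    """Years until a sample of current_n firms reaches target_n,
    assuming constant compound growth (annual_growth = 0.25 means 25% a year)."""
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    if current_n >= target_n:
        return 0.0
    return math.log(target_n / current_n) / math.log(1.0 + annual_growth)

# Hypothetical example: a sample that doubles every year (100% growth)
# and needs to become eight times larger.
years = years_to_target(1000, 8000, 1.0)
print(f"about {years:.1f} years")
```

Because growth compounds, the required time rises with the logarithm of the sample-size gap, so even large gaps can translate into a handful of years if growth stays high.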

Endnotes

  1. The confidence interval is the range of plausible values for the true value of the indicator we are trying to estimate, at a given level of confidence. The 95% level chosen for this study means that in the long run (that is, under repeated sampling), the calculated range would contain the true value of the metric 95% of the time.
  2. Official statistics on the industry distribution of businesses are sparse or not available at all. In addition, local industry classifications are not directly comparable with ANZSIC06, which is currently used as the default in XSBI.
  3. In the case of Singapore and Hong Kong, it may not be worthwhile to provide geographical breakdowns given the small size of the countries. Also, official statistics on the geographical distribution of businesses are sparse or not available at all, which makes it challenging to estimate accuracy (or determine appropriate sample sizes or weights).
  4. Historical (multi-year) data is critical for the accurate extraction of seasonal/trend patterns and any analysis aimed at detecting/describing change over time.